Abstract: In this work, three machine learning approaches were
evaluated for detecting anomalies in impedance-based structural health
monitoring (ISHM) of a specimen in a controlled environment. Supervised,
unsupervised, and semi-supervised algorithms were compared on their
ability to detect anomalies in an aluminum beam with damage induced by
surface machining on one of its faces. Of the three types of learning,
the supervised and semi-supervised models achieved the best accuracy in
detecting anomalies, while the unsupervised model did not obtain good
results for the conditions investigated. This comparison can therefore
inform the implementation of real ISHM anomaly detection systems.


I. INTRODUCTION


In the last decades, the constant evolution of several
technologies has made it possible to build highly complex
machines and structures. With these advances, some actions
are necessary to guarantee the proper functioning of the
equipment or the construction of a structure, such as
better-prepared workers/operators, higher-quality raw
material, and more complex maintenance, since failures can
generate significant financial losses or, in more severe
cases, loss of life [8].


In corrective maintenance, the repair of the structure is
done immediately, which implies stopping it to carry out
the maintenance. This type of maintenance is a reaction to
events that have already occurred. Therefore, in the event
of an unforeseen anomaly, the production of a machine or
the use of a particular structure is stopped immediately,
which can lead to a significant loss of time, high
financial losses, and, depending on the severity of the
defect, loss of life [9].


One of the several methods used in predictive
maintenance is Structural Health Monitoring based on
Electromechanical Impedance (ISHM). This technique
aims to identify changes in the properties of the structure
that indicate the occurrence of anomalies. Using the direct
and inverse effects of certain piezoelectric materials, the
method consists of bonding a sensor in the form of a PZT
(Lead Zirconate Titanate) patch to the structure. When
excited at high frequency, around 30 kHz, the patch deforms
the structure, which consequently causes it to vibrate [3].


In parallel, numerous machine learning techniques
have been developed to solve increasingly complex
problems. One example is anomaly detection models, which,
with recent technological advances, have significantly
impacted the maintenance sector [11].


The purpose of this work is therefore to apply anomaly
detection models based on supervised, unsupervised, and
semi-supervised machine learning methods to data
collected from the structural integrity monitoring of an
aluminum beam, in order to identify faults that were
imposed on the surface of the specimen.


II. METHODOLOGY


Piezoelectric materials are widely used in the
implementation of SHM due to their direct and inverse
effects, the best known being Lead Zirconate Titanate
(PZT). In the direct effect, mechanical stress changes the
electrical state of the material, expressed by equation
(1), and this phenomenon is used for sensing [10] [7].


$$ D = \varepsilon E + d\,\sigma, \tag{1} $$


where $D$ is the electric displacement vector, $\varepsilon$ is the
dielectric tensor of the material, $E$ is the electric field vector,
$d$ is the piezoelectric coefficient tensor, and $\sigma$ is the
stress vector [10] [7].

In the inverse effect, the dimensions of the material
change in response to an applied electrical potential
difference, expressed by equation (2), so the material
functions as an actuator [10] [7].


$$ e = s\,\sigma + d\,E, \tag{2} $$


where $e$ is the strain tensor and $s$ is the elastic compliance of
the piezoelectric material [10] [7].


In 1994, Liang, Sun, and Rogers published a model of the
electromechanical impedance measurement process for a
system with one degree of freedom. Combining the
mechanical impedance of the PZT patch, $Z_a$, and the
mechanical impedance of the structure, $Z$, gives the
admittance function $Y$ (the inverse of impedance), shown in
equation (3) [5].


$$ Y = \frac{I}{V} = i\omega \frac{w_a l_a}{h_a} \left( \bar{\varepsilon}_{33}^{T} - \frac{Z}{Z + Z_a}\, d_{32}^{2}\, \hat{Y}_{22}^{E} \right), \tag{3} $$


where $I$ is the output current of the PZT wafer, $V$ is the
voltage applied to it, $\omega$ is the angular frequency, $w_a$ is
the width of the PZT wafer, $l_a$ is the length of the PZT wafer,
$h_a$ is the thickness of the PZT wafer, $\bar{\varepsilon}_{33}^{T}$
is the complex dielectric constant of the PZT wafer at zero stress,
$d_{32}$ is the piezoelectric constant, and $\hat{Y}_{22}^{E}$ is the
complex Young's modulus of the PZT wafer at zero electric field [5].


With the piezoelectric sensor/actuator bonded to the
structure, the monitoring measures the real part of the
impedance signature (IS). The signatures of the structure
in its pristine state are then compared with the signatures
obtained after damage in order to locate the problem or
quantify its level. This comparison can be made using a
damage metric such as the Root Mean Square Deviation
(RMSD), shown in equation (4), or, when a single value is
needed to express the difference between two sets, the
Correlation Coefficient Deviation (CCD), shown in
equation (5) [3].


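$$ \mathrm{RMSD} = \sqrt{ \frac{\sum_{i=1}^{n} \left[ \mathrm{Re}(Z_i) - \mathrm{Re}(Z_i^0) \right]^2}{\sum_{i=1}^{n} \left[ \mathrm{Re}(Z_i^0) \right]^2} }, \tag{4} $$


where $\mathrm{Re}(Z)$ represents the real part of the PZT impedance
measured under healthy conditions, $\mathrm{Re}(Z^0)$ represents the
real part of the signal to be compared, and $n$ is the number of
points in the signature [3].


$$ \mathrm{CCD} = 1 - \mathrm{CC}, \tag{5} $$


where CC is the correlation coefficient calculated using equation
(6) [3].


$$ \mathrm{CC} = \frac{1}{n\, \sigma_{Z}\, \sigma_{Z^0}} \sum_{i=1}^{n} \left[ \mathrm{Re}(Z_i) - \mathrm{Re}(\bar{Z}) \right] \left[ \mathrm{Re}(Z_i^0) - \mathrm{Re}(\bar{Z}^0) \right], \tag{6} $$


where $\sigma_{Z}$ is the standard deviation of the impedance
signal measured under healthy conditions, $\sigma_{Z^0}$ is the
standard deviation of the impedance signal to be compared, and
$\bar{Z}$ and $\bar{Z}^0$ represent the mean values [3].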
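To make the two damage metrics concrete, the following minimal sketch evaluates equations (4) to (6) in plain Python. The function names and the two short signatures are hypothetical and serve only as an illustration; real impedance signatures contain many more frequency points.

```python
import math

def rmsd(re_z, re_z0):
    """Equation (4): RMSD between two real-part impedance signatures."""
    num = sum((a - b) ** 2 for a, b in zip(re_z, re_z0))
    den = sum(b ** 2 for b in re_z0)
    return math.sqrt(num / den)

def ccd(re_z, re_z0):
    """Equations (5) and (6): one minus the correlation coefficient."""
    n = len(re_z)
    mean_z = sum(re_z) / n
    mean_z0 = sum(re_z0) / n
    # Population standard deviations, matching the 1/n factor in equation (6).
    std_z = math.sqrt(sum((a - mean_z) ** 2 for a in re_z) / n)
    std_z0 = math.sqrt(sum((b - mean_z0) ** 2 for b in re_z0) / n)
    cov_sum = sum((a - mean_z) * (b - mean_z0) for a, b in zip(re_z, re_z0))
    cc = cov_sum / (n * std_z * std_z0)
    return 1.0 - cc

# Hypothetical signatures (real part of impedance), for illustration only.
healthy = [100.0, 120.0, 150.0, 130.0, 110.0]
compared = [102.0, 125.0, 140.0, 135.0, 108.0]

rmsd_value = rmsd(healthy, compared)
ccd_value = ccd(healthy, compared)
```

Both metrics are zero when the two signatures are identical. CCD is also insensitive to a uniform scaling of one signature, whereas RMSD is not, which is one reason the two are often reported together.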


Machine learning techniques can be classified according
to several criteria: whether they are trained with
supervision (labels) or not; whether they group similar
samples; whether they can learn quickly; and whether they
work by comparing new data points with known data points
or by detecting patterns to create predictive models.
These categories are not exclusive, and more than one
technique can be combined to arrive at the best solution
[2].


As mentioned in Section I, anomaly detection models
based on supervised, unsupervised, and semi-supervised
machine learning techniques are used, with the following
algorithms, respectively: Logistic Regression,
Copula-Based Outlier Detection (COPOD), and Local
Outlier Factor (LOF).


Starting with the supervised model, the Logistic
Regression algorithm aims to separate two classes: inliers
(typical data) and outliers (atypical data). This technique
relates the dependent variable, also known as the label
(y), which takes the value 0 or 1, to the independent
variables (X) [6]. Equation (7) represents logistic
regression.


$$ P(y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_n X_n)}}, \tag{7} $$


where $X$ represents the independent variables (model
inputs) and the coefficients $\beta$ are estimated by the maximum
likelihood method, which aims to maximize the probability
that the data set has been observed [6].


Varying the values of X, it is possible to observe that 
the curve behaves in the shape of the letter "S", shown in 
Fig. 1, reaching a high degree of generalization [6]. 


Fig. 1: Logistic Regression Curve.
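The "S" shape of equation (7) can be reproduced with a few lines of Python. This is only a sketch: the function name `predict_proba` and the coefficient values (intercept 0, slope 1) are assumptions chosen for illustration, not the coefficients fitted in this work.

```python
import math

def predict_proba(x, beta0, betas):
    """Equation (7): P(y = 1) for a list of input features x."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-feature model evaluated on the range -3..3.
curve = [(x, predict_proba([x], 0.0, [1.0])) for x in range(-3, 4)]
```

The probability is 0.5 where the linear term is zero and saturates toward 0 and 1 at the extremes, so a threshold on this value separates the two classes.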


For the unsupervised model, the COPOD algorithm was
used, which is based on empirical cumulative distribution
functions (ECDFs), given by equation (8) [4].


$$ \hat{F}(x) = \mathbb{P}\big((-\infty, x]\big) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{I}(X_i \le x), \tag{8} $$


where $X$ is a d-dimensional dataset, $n$ is the number of
observations, and $\mathbb{I}$ is the indicator function. The
algorithm is based on copulas, which are functions that
separate the marginal distributions from the dependency
structure of a multivariate distribution, shown in
equation (9) [4].


$$ C_U(u) = \mathbb{P}(U_1 \le u_1, \dots, U_d \le u_d), \tag{9} $$


where $U$ is a random vector and $u \in [0, 1]^d$. Since this
algorithm is deterministic, there are no hyperparameters
(parameters to be defined before training) to tune, which
avoids possible biases from their selection. Regarding
efficiency, the COPOD algorithm is one of the leading
choices when the sample set has a high dimension, quickly
solving problems where the data have 10,000 attributes
and 1,000,000 observations on a personal computer [4].


Building the anomaly detector requires three steps. The
first is to calculate the left-tail ECDFs
$\hat{F}_1(x), \dots, \hat{F}_d(x)$ and the right-tail ECDFs
$\hat{\bar{F}}_1(x), \dots, \hat{\bar{F}}_d(x)$ of the sample set,
using equation (8). The asymmetry (skewness) vector
$b = [b_1, \dots, b_d]$ is also calculated in this first step,
shown in equation (10) [4].


$$ b_j = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_{j,i} - \bar{x}_j)^3}{\left[ \frac{1}{n-1} \sum_{i=1}^{n} (x_{j,i} - \bar{x}_j)^2 \right]^{3/2}}, \tag{10} $$


where $b_j$ is the standard formulation for estimating the
asymmetry of dimension $j$ and $x$ is the input data. The
second step is calculating the copula's empirical
observations for each input, shown in equation (11) [4].


$$ (\hat{U}_{1,i}, \dots, \hat{U}_{d,i}) = \big(\hat{F}_1(x_{1,i}), \dots, \hat{F}_d(x_{d,i})\big), \tag{11} $$


where $\hat{U}$ is the copula's empirical observations and $x$ is
the input data. This calculation gives the left tail, and
$\hat{V}_{j,i} = \hat{\bar{F}}_j(x_{j,i})$ gives the right tail.
Using the asymmetry to correct the empirical observations, we
obtain $\hat{W}_{j,i} = \hat{U}_{j,i}$ if $b_j < 0$, and otherwise
$\hat{W}_{j,i} = \hat{V}_{j,i}$ [4].


The third and final step is calculating the score of each
observation from the quantities of the previous step: the
maximum of the negative log probabilities generated by the
left-tail, right-tail, and asymmetry-corrected empirical
copulas. Observations with the highest scores are the
outliers [4].
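The three steps can be sketched in plain Python as follows. This is an illustrative reading of the procedure described above, not the code used in this work: the function names and the small two-feature data set are hypothetical, and library implementations such as the one in PyOD may differ in details.

```python
import math

def tail_ecdf(column):
    """Left-tail ECDF value of each entry within its own column (equation (8))."""
    n = len(column)
    return [sum(1 for v in column if v <= x) / n for x in column]

def skewness(column):
    """Asymmetry coefficient of one dimension (equation (10))."""
    n = len(column)
    mean = sum(column) / n
    third_moment = sum((x - mean) ** 3 for x in column) / n
    variance = sum((x - mean) ** 2 for x in column) / (n - 1)
    return third_moment / math.sqrt(variance) ** 3

def copod_scores(rows):
    """Outlier score per observation; a higher score means more anomalous."""
    columns = list(zip(*rows))
    # Step 1: left-tail ECDFs, right-tail ECDFs (ECDF of -x), and skewness.
    left = [tail_ecdf(col) for col in columns]
    right = [tail_ecdf([-v for v in col]) for col in columns]
    skew = [skewness(col) for col in columns]
    scores = []
    for i in range(len(rows)):
        p_left = p_right = p_skew = 0.0
        for j in range(len(columns)):
            # Step 2: empirical copula observations U, V and corrected W.
            u, v = left[j][i], right[j][i]
            w = u if skew[j] < 0 else v
            p_left -= math.log(u)
            p_right -= math.log(v)
            p_skew -= math.log(w)
        # Step 3: maximum of the three negative log probabilities.
        scores.append(max(p_left, p_right, p_skew))
    return scores

# Hypothetical two-feature data; the last row is far from the others.
data = [[1.0, 2.1], [1.1, 1.9], [0.9, 2.0], [1.2, 2.2], [1.05, 2.05], [8.0, 9.0]]
scores = copod_scores(data)
```

In this toy data the last observation receives the highest score because it sits in the extreme right tail of both dimensions. A threshold on the scores, for example one derived from an assumed contamination level, would then turn them into inlier or outlier labels.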


The Local Outlier Factor (LOF) algorithm was used to
develop the semi-supervised model. LOF considers only a
restricted neighborhood of each object. For most objects
in a cluster, the LOF is approximately equal to 1; for
other objects, lower and upper bounds apply, and these
bounds emphasize the local nature of the factor. The LOF
of an object depends on a chosen number of nearest
neighbors, which defines the neighborhood of the object
and is used to find possible local anomalies [1].


With the number of nearest neighbors defined, it is
possible to calculate the reachability distance, which is
the larger of two distances, as shown in equation (12).


$$ \text{reach-dist}_k(p, o) = \max\{\, k\text{-distance}(o),\ d(p, o) \,\}, \tag{12} $$


where $k$ is a natural number, $p$ is an object related to
object $o$, $d(p, o)$ is the distance between them, and
$k\text{-distance}(o)$ is the distance from $o$ to its $k$-th nearest
neighbor. If the object $p$ is within the $k$-neighborhood of
object $o$, then the reachability distance is the
$k$-distance of $o$. Otherwise, it is the distance between $p$
and $o$ [1]. Fig. 2 illustrates this step.


Fig. 2: Reachability Density.


Then the Local Reachability Density (LRD) is
calculated. Two parameters usually define the density:
the first is MinPts, which is the minimum number of
objects, and the second is the volume. Equation (13)
shows this calculation, which uses the reachability
distance as a measure of the volume [1].


$$ \mathrm{LRD}_{MinPts}(p) = 1 \Bigg/ \left( \frac{\sum_{o \in N_{MinPts}(p)} \text{reach-dist}_{MinPts}(p, o)}{|N_{MinPts}(p)|} \right), \tag{13} $$


where LRD is the local reachability density, $N_{MinPts}(p)$ is
the set of nearest neighbors of $p$, and $p$ is the object to be
compared with the object $o$. Finally, the density calculated in
equation (13) is compared with that of the neighbors, shown in
equation (14) [1].


$$ \mathrm{LOF}_{MinPts}(p) = \frac{\sum_{o \in N_{MinPts}(p)} \frac{\mathrm{LRD}_{MinPts}(o)}{\mathrm{LRD}_{MinPts}(p)}}{|N_{MinPts}(p)|}. \tag{14} $$


The purpose of equation (14) is to evaluate possible
novelties and test new elements in the set [1].
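Equations (12) to (14) can be chained into a compact sketch. The version below is a simplification written for illustration: it takes exactly k neighbors even when distances tie, assumes there are no duplicate points, and uses hypothetical function names and toy data (a 3 x 3 grid plus one distant point).

```python
import math

def k_nearest(points, idx, k):
    """Indices of the k nearest neighbors of points[idx] and its k-distance."""
    dists = sorted(
        (math.dist(points[idx], q), j) for j, q in enumerate(points) if j != idx
    )
    nearest = dists[:k]
    return [j for _, j in nearest], nearest[-1][0]

def lof_scores(points, k):
    """LOF of every point, with MinPts = k."""
    info = [k_nearest(points, i, k) for i in range(len(points))]

    def reach_dist(p, o):
        # Equation (12): larger of the k-distance of o and the distance p to o.
        return max(info[o][1], math.dist(points[p], points[o]))

    def lrd(p):
        # Equation (13): inverse of the mean reachability distance.
        neighbors = info[p][0]
        return len(neighbors) / sum(reach_dist(p, o) for o in neighbors)

    densities = [lrd(p) for p in range(len(points))]
    # Equation (14): mean ratio of the neighbors' densities to the point's own.
    return [
        sum(densities[o] for o in info[p][0]) / (len(info[p][0]) * densities[p])
        for p in range(len(points))
    ]

# Hypothetical data: nine clustered points and one isolated point.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=3)
```

The grid points obtain LOF values close to 1, while the isolated point obtains a value several times larger, which is the behavior described above for objects inside and outside a cluster.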


For the proposed experiment, an aluminum beam 500 mm
long, 38 mm wide, and 3.2 mm thick was used in two
conditions. The first, without any damage, is called the
baseline. In the second, surface machining 30 mm wide was
made on one of the faces of the beam to simulate damage.
The experiments were conducted at the LMEst (Structural
Mechanics Laboratory) at FEMEC-UFU. Fig. 3 shows this
step.

Fig. 3: Surface machining.


On the side opposite to where the machining took place,
at 70 mm, a PZT patch 1 mm thick and 20 mm in diameter
was bonded 100 mm from the edge of the structure to
measure the impedance of the specimen in the two
conditions already mentioned, as shown in Fig. 4.


Fig. 4: PZT patch coupled to the beam 


As shown in Fig. 3 and Fig. 4, two supports were built
for the beam using a 3D printer and inserted on each side
so that the specimen is adequately positioned and does not
suffer interference from the base. The whole set can be
seen in Fig. 5.


Fig. 5: Beam used in the analysis 


In real-world problems, one of the main factors
affecting the results is temperature, which in this
experiment is the primary source of noise in the
technique, since thermal variation can induce structural
deformations and change the mechanical behavior of the
piezoelectric material. To simulate variations of this
external factor, the aluminum beam was placed in an
EPL-4H climatic chamber of the Platinous series, so that
the data collection more closely resembles real operating
conditions, as shown in Fig. 6.


Fig. 6: EPL-4H temperature chamber. 


Three temperature levels were chosen in the chamber,
ranging from 13°C to 22°C, rising every 3°C. After the
impedance data collection at a given level was complete,
the chamber raised the temperature and held it for 30
minutes. After stabilization, data collection occurred
again. Thus, with three temperature sets, two baseline
levels, seven damage levels, and 30 repetitions, 810
impedance signatures were collected: 180 baseline and 630
damaged signatures. These data were collected by an
acquisition board connected to the PZT patch and stored
on a server close to the chamber.
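The signature counts follow directly from the experimental design, as the short check below illustrates. It assumes, as the totals suggest, that every structural condition was measured 30 times at each temperature set; the variable names are hypothetical.

```python
# Experimental design as stated in the text.
temperature_sets = 3
baseline_levels = 2
damage_levels = 7
repetitions = 30

baseline_signatures = temperature_sets * baseline_levels * repetitions
damage_signatures = temperature_sets * damage_levels * repetitions
total_signatures = baseline_signatures + damage_signatures
```

The resulting data set is imbalanced, with 3.5 damaged signatures for each baseline signature, which is relevant when choosing the training splits for the supervised model.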


III. RESULTS AND DISCUSSIONS

With the data collected and grouped in a spreadsheet
file, the algorithms (COPOD, LOF, and Logistic Regression)
were applied in the Python language using the libraries
shown in Table 1.


Table 1: Libraries in Python

Model                 Library
Logistic Regression   Scikit-learn
COPOD                 PyOD
LOF                   Scikit-learn


Starting with the supervised Logistic Regression
algorithm, 125 baseline impedance signatures and 154
signatures corresponding to damage levels 2 to 7 were
used for training, since this model must be trained with
both classes and the data should preferably be balanced.
To test the classifier, 55 baseline samples and 476
damage samples were used. The samples were chosen
randomly.


For training the unsupervised COPOD algorithm, 180
baseline samples were used with a contamination level of
0.25 to make the detector more judicious. For testing,
the 630 samples corresponding to all damage levels were
used. Since this method is unsupervised, there is no need
to split the samples into training and test groups.


For the LOF algorithm, trained in a semi-supervised way,
125 baseline and 88 damaged samples were used for
training. Two other vital parameters are the number of
neighbors, defined as 15, and the contamination level,
defined as 0.5, with the same objective mentioned before.
Table 2 shows the results of each model.


Table 2: Experimental Results

Model                 Total Amount of Samples   Anomalies Identified
Logistic Regression   531                       100%
COPOD                 810                       88.7%
LOF                   597                       95.7%


As can be seen from Table 2, all the anomaly detection
techniques could be applied to failure prediction, since
the results were obtained from actual experimental data
from a structure.


Figures 7-9 below show the confusion matrices for each
model.


Fig. 7: Confusion Matrix - COPOD (plot title: Novelty Detection by COPOD Model)


According to Fig. 7, the COPOD model identified 132
baseline cases correctly and was wrong in 48 cases (type I
errors). In the most critical case, the correct detection
of damage (anomaly), the algorithm identified 559
signatures correctly and missed 71 (type II errors). This
second type of error is more harmful, as 71 damage
conditions were classified as baseline.


The semi-supervised LOF algorithm obtained a better
result than COPOD, as shown in Fig. 8.


Fig. 8: Confusion Matrix - LOF (plot title: Novelty Detection by Local Outlier Factor Model; baseline row: 46 predicted as baseline, 9 predicted as damage)


In this second method, there were fewer type I and type
II errors, and the severe cases of false negatives were
only 23, against 519 true positives.


Figure 9 shows the confusion matrix of the model
based on Logistic Regression.


Fig. 9: Confusion Matrix - Logistic Regression (plot title: Novelty Detection by Logistic Regression Model; rows: Real Values; columns: Predicted Values; baseline row: 55 predicted as baseline, 0 predicted as damage)


Finally, the model with the best result, perhaps as
expected, is the supervised model, in which there were no
type I or type II errors.
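The percentages in Table 2 can be recomputed from the confusion matrix counts quoted in this section. The sketch below is only a consistency check: the function names are hypothetical, the LOF baseline counts (46 and 9) are read from Fig. 8, and the Logistic Regression counts assume that all 55 baseline and 476 damage test samples were classified correctly, as the text states.

```python
def detection_rate(tp, fn):
    """Fraction of damaged signatures flagged as anomalies."""
    return tp / (tp + fn)

def accuracy(tn, fp, fn, tp):
    """Fraction of all signatures classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)

# Counts per model: (true negatives, false positives, false negatives, true positives).
matrices = {
    "Logistic Regression": (55, 0, 0, 476),
    "COPOD": (132, 48, 71, 559),
    "LOF": (46, 9, 23, 519),
}

rates = {name: detection_rate(tp, fn) for name, (tn, fp, fn, tp) in matrices.items()}
accuracies = {name: accuracy(*counts) for name, counts in matrices.items()}
totals = {name: sum(counts) for name, counts in matrices.items()}
```

Under these assumptions, the detection rates are about 0.887 for COPOD and 0.958 for LOF, in line with the 88.7% and 95.7% of Table 2, and the totals match the 531, 810, and 597 samples listed there. The overall accuracy, which also counts the type I errors, is lower: roughly 0.853 for COPOD and 0.946 for LOF.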
IV. CONCLUSION


This contribution compared anomaly detection models
based on three types of machine learning, supervised,
unsupervised, and semi-supervised, represented by
Logistic Regression, COPOD, and LOF, respectively. After
a theoretical review of impedance-based structural
integrity monitoring (ISHM) and of the machine learning
techniques, the problem to be solved was presented:
identifying anomalies in a specimen with damage made on
one of its faces. The specimen was placed in a climatic
chamber to simulate temperature variations.


The Logistic Regression and LOF algorithms correctly
identified all levels of anomalies. In the first model,
because training used labeled data, it was possible to
detect all the impedance signatures corresponding to the
damage induced in the beam through surface machining. The
second model uses semi-supervised learning, also known as
novelty detection, in which all training data are
considered baseline. After this step, the detector
identified the novelties in the data corresponding to
anomalies.


The COPOD algorithm was the worst at detecting
anomalies, identifying 88.7% of them. This result could
be expected, because it is an unsupervised technique and
receives no label information during training. However,
in an actual structural monitoring condition, this level
of performance can still be significant and can be
applied in monitoring systems with redundancy.


The semi-supervised LOF technique obtained the
second-best result and shows excellent potential for
actual application, given its 95.7% accuracy in damage
detection.


The best result, with 100% accuracy, was obtained by the
supervised model based on Logistic Regression. However,
when monitoring real structures, this training process is
not always possible, considering that the structure is
usually already in service when monitoring starts.
Finally, all methods showed excellent potential for fault
detection, in addition to being relatively simple to
implement, including on microcontrollers, which allows
the development of intelligent sensors for fault
prediction.